The Opioid Epidemic
Introduction
The opioid crisis is truly that - a crisis. Over the past 20 years, opioids have become a commonly used recreational drug, and people who use them can become addicted. Opioid abuse often starts with a legitimate prescription for pain. However, the addictive nature of the drug can lead people to seek out more opioids after their prescription ends. In small doses, opioids are effective painkillers that may cause drowsiness. In large doses (abuse), opioids can slow breathing and heart rate. These effects can cause death, which is what happens in a fatal overdose.
The number of opioid prescriptions grew substantially beginning in 2006 and peaked in 2012 at 255 million. Since 2012, the number has fallen to 168 million in 2018; however, that is still a huge number of opioid prescriptions. Unfortunately, overdoses have also been on the rise and - unlike prescriptions - are continuing to rise (as can be seen in the figure below).
Image from the CDC - https://www.cdc.gov/nchs/data/databriefs/db356-h.pdf
Addiction is the mechanism by which a legitimate prescription can lead someone toward opioid abuse and, potentially, an overdose. Anyone who is legally prescribed opioids (potentially an excessive number of doses) is at risk of developing an addiction. If they do, they may go searching for more opioids after their prescription runs out. The most common replacements are drugs like heroin and fentanyl. Both are very strong opioids that can easily lead to an overdose even after a single use; fentanyl is roughly 100 times as strong as morphine, itself a notoriously powerful painkiller. It is also worth noting that prescription opioids cause deaths too, not just the strongest types.
For this project, we wanted to explore the relationship between the opioid prescription rate and the overdose rate, and to figure out which states (if any) are disproportionately affected.
Data
Prescription Rate Data
The data we used to determine the prescription rate for each state comes from the CDC’s website. It includes data for each state - and summary data for the entire US - on the total number of opioid prescriptions and the opioid prescription rate (per 100 persons) for the years 2006 - 2018. Although the data contained 13 years' worth of prescribing information, we only used 2014 - 2018 because those are the years with matching overdose data.
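As a rough sketch of how the year restriction and the match with overdose data could be done in R: the data frames `prescriptions` and `overdoses`, their column names, and every number in them are hypothetical stand-ins, not the CDC files used in this project.

```r
# Hypothetical stand-in for the prescribing data (2006-2018); values are made up.
prescriptions <- data.frame(
  state = rep(c("State A", "State B"), each = 13),
  year = rep(2006:2018, times = 2),
  prescription_rate = c(seq(80, 56, length.out = 13),
                        seq(120, 84, length.out = 13))
)

# Hypothetical stand-in for the overdose data (2014-2018 only); values are made up.
overdoses <- data.frame(
  state = rep(c("State A", "State B"), each = 5),
  year = rep(2014:2018, times = 2),
  overdose_rate = c(14, 15, 17, 18, 17, 30, 33, 38, 42, 40)
)

# Keep only the prescribing years that have matching overdose data.
prescriptions_subset <- subset(prescriptions, year >= 2014 & year <= 2018)

# Inner join on state and year: one row per state-year with both rates.
combined <- merge(prescriptions_subset, overdoses, by = c("state", "year"))
head(combined)
```

Because `merge()` performs an inner join by default, the explicit `subset()` call is not strictly required; it is kept here to make the year restriction visible.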
Overdose Rate Data
Where data is from explain what it is
Prescription Rate vs. Overdose Rate
- Show maps
- Ask Questions about whether one causes the other, etc.
Prescription Rate Maps
2014
2015
2016
2017
2018
With these maps we can see how the prescription rate for opioids changed from 2014 to 2018. Policies instituted by the FDA in the late 2000s appear to have taken effect: nearly every state lowered its opioid prescription rate each year. Two states worth highlighting are Alabama and West Virginia, with prescription rates of 126 and 135 per 100 persons, respectively. These two states are on the higher end of prescription rates in the US, and later we can look into how this could play a role in the overdose rates in both states.
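The year-over-year decline described above can also be checked numerically rather than read off the maps. The sketch below assumes a hypothetical long-format data frame `rates` with one row per state and year; the states and values are invented for illustration.

```r
# Hypothetical long-format data: one row per state and year; values are made up.
rates <- data.frame(
  state = rep(c("State A", "State B", "State C"), each = 5),
  year = rep(2014:2018, times = 3),
  prescription_rate = c(90, 85, 80, 72, 65,
                        130, 125, 118, 110, 100,
                        60, 62, 58, 55, 50)
)

# Make sure each state's rows are in chronological order before differencing.
rates <- rates[order(rates$state, rates$year), ]

# TRUE for a state when its rate fell in every consecutive pair of years.
fell_every_year <- sapply(
  split(rates$prescription_rate, rates$state),
  function(r) all(diff(r) < 0)
)
fell_every_year

# Share of states with an uninterrupted decline.
mean(fell_every_year)
```

In this toy data, State C rises once (60 to 62), so it is the only state flagged `FALSE`.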
Overdose Rate Maps
2014
2015
2016
2017
2018
Regression
A very simple way to test whether two variables are related is to run a bivariate linear regression that estimates how the overdose rate in a state changes with a 1 unit increase in that state's prescription rate. After running the model, we found a statistically significant relationship between the two variables (p-value < .05). The fitted linear relationship suggests that a 1 unit increase in the prescription rate is associated with a .10475 unit increase in the overdose rate. Although this model is not perfect (we have almost certainly left out contributing variables, which introduces some bias), the statistically significant, positive relationship between the prescription rate and the overdose rate at least suggests that the overdose rate is likely to go up with a greater prescription rate.
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)       18.52501    2.00731   9.229   <2e-16 ***
## prescription_rate  0.01915    0.02768   0.692     0.49
##
## Residual standard error: 9.043 on 248 degrees of freedom
## Multiple R-squared: 0.001926, Adjusted R-squared: -0.002099
## F-statistic: 0.4785 on 1 and 248 DF, p-value: 0.4897
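For readers who want to see how a coefficient table like this is produced, here is a minimal sketch of a bivariate regression with `lm()`. It runs on simulated data, not the data from this project: the data frame `toy`, the sample size, and the true slope of 0.08 are all arbitrary assumptions chosen for illustration.

```r
set.seed(1)

# Simulated state-year observations; NOT the CDC data analyzed in this report.
toy <- data.frame(prescription_rate = runif(250, min = 30, max = 140))
toy$overdose_rate <- 10 + 0.08 * toy$prescription_rate + rnorm(250, sd = 5)

# Bivariate linear regression: overdose rate as a function of prescription rate.
model <- lm(overdose_rate ~ prescription_rate, data = toy)

# The coefficient table holds the estimate, standard error, t value and p-value.
coefs <- summary(model)$coefficients
slope <- coefs["prescription_rate", "Estimate"]
p_value <- coefs["prescription_rate", "Pr(>|t|)"]

slope    # estimated change in overdose rate per 1 unit of prescription rate
p_value  # compared against .05 to judge statistical significance
```

The slope is read as the expected change in the overdose rate for a 1 unit increase in the prescription rate, and the relationship is called statistically significant when the slope's p-value falls below .05.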
K-Means Clustering
Text explaining why k-means
wanted to group state based on severity of the problem
Determining the Optimal K
The optimal k for k-means clustering often has to be chosen by hand, so we wrote code to determine it automatically. We chose the k with the largest mean silhouette coefficient. For each sample, the silhouette coefficient is calculated from the mean intra-cluster distance and the mean nearest-cluster distance. The values range from -1 (the sample is likely assigned to the wrong cluster) to 1 (best possible value). The function below runs k-means clustering for a given k, and then calculates the mean silhouette coefficient using the silhouette function from the cluster package. We evaluate k from 2 to 5, and then pick the k with the largest value.
# Helper: run k-means for a given k and return the mean silhouette coefficient.
# Columns 2:3 of full_data are the two features being clustered.
silhouette_score <- function(k) {
  km <- kmeans(full_data[, 2:3], centers = k, nstart = 20)
  score <- cluster::silhouette(km$cluster, dist(full_data[, 2:3]))
  mean(score[, 3])  # column 3 holds each sample's silhouette width
}

# Evaluate k = 2..5 and keep the k with the largest mean silhouette coefficient
k <- 2:5
avg_sil <- sapply(k, silhouette_score)
optimal_k <- k[which.max(avg_sil)]
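To make the silhouette coefficient more concrete, the sketch below computes it from first principles in base R. For each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette width is (b - a) / max(a, b). The helper `mean_silhouette` and the toy points are illustrative assumptions and are not part of the analysis above; the function expects at least two clusters.

```r
# Mean silhouette width computed by hand (base R only, needs >= 2 clusters).
mean_silhouette <- function(x, cluster) {
  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  n <- nrow(d)
  widths <- numeric(n)
  for (i in seq_len(n)) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) {
      widths[i] <- 0  # singleton clusters get width 0 by convention
      next
    }
    # a: mean distance to the other members of the sample's own cluster
    # (d[i, i] is 0, so dividing by size - 1 excludes the sample itself)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to the members of any other cluster
    others <- setdiff(unique(cluster), cluster[i])
    b <- min(sapply(others, function(cl) mean(d[i, cluster == cl])))
    widths[i] <- (b - a) / max(a, b)
  }
  mean(widths)
}

# Toy data: two tight, well-separated groups of three points each.
toy <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))

good <- c(1, 1, 1, 2, 2, 2)  # labels that match the two groups
bad  <- c(1, 2, 1, 2, 1, 2)  # labels that mix the two groups

mean_silhouette(toy, good)  # close to 1
mean_silhouette(toy, bad)   # much lower
```

A labeling that matches the natural groups scores close to 1, while a labeling that mixes the groups scores far lower, which is why the k with the largest mean value is preferred.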
Clustering for 2014-2018
2014
2015
2016
2017
2018
text about the clustering